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The analysis of data from gravitational wave detectors can be divided into three phases: search, 
characterization, and evaluation. The evaluation of the detection - determining whether a candidate 
event is astrophysical in origin or some artifact created by instrument noise - is a crucial step in 
the analysis. The on-going analyses of data from ground based detectors employ a frequentist 
approach to the detection problem. A detection statistic is chosen, for which background levels and 
detection efficiencies are estimated from Monte Carlo studies. This approach frames the detection 
problem in terms of an infinite collection of trials, with the actual measurement corresponding to 
some realization of this hypothetical set. Here we explore an alternative, Bayesian approach to the 
detection problem, that considers prior information and the actual data in hand. Our particular 
focus is on the computational techniques used to implement the Bayesian analysis. We find that the 
Parallel Tempered Markov Chain Monte Carlo (PTMCMC) algorithm is able to address all three 
phases of the anaylsis in a coherent framework. The signals are found by locating the posterior 
modes, the model parameters are characterized by mapping out the joint posterior distribution, and 
finally, the model evidence is computed by thermodynamic integration. As a demonstration, we 
consider the detection problem of selecting between models describing the data as instrument noise, 
or instrument noise plus the signal from a single compact galactic binary. The evidence ratios, or 
Bayes factors, computed by the PTMCMC algorithm are found to be in close agreement with those 
computed using a Reversible Jump Markov Chain Monte Carlo algorithm. 
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I. INTRODUCTION 

The first direct detection of gravitational waves (GWs) 
is now within reach, barkening the beginning of a new 
field of astronomy. The Laser Interferometer Gravita- 
tional wave Observatory (LIGO) [1, 2] has recently com- 
pleted acquiring one year of triple-coincidence data at 
design sensitivities (S5) and analysis efforts are in full 
swing [3]. 

Upgrades to the LIGO and Virgo [4] detectors are near- 
ing completion in preparation for the S6/VSR2 data col- 
lection period, paving the way for the advanced genera- 
tion of observatories and routine gravitational wave de- 
tection. Plans for the space based Laser Interferometer 
Space Antenna (LISA) [5] continue to mature. All key 
LISA technologies are now in place and will soon be flown 
on the LISA Pathfinder [6] mission. While hardware de- 
velopments have brought the first detection within reach, 
exploting the full potential of these instruments requires 
similar improvements in the techniques used to analyse 
the data. 

The ability to determine whether data contains just 
instrument noise, or noise plus a resolvable gravitational 
wave signal, is the central problem in gravitational wave 
astronomy. The "detection problem" can be divided into 
three stages: 

• Search: Determine the region(s) in parameter space 
containing the maximum posterior weight to estab- 
lish a candidate gravitational wave signal. 

• Characterization: Establish confidence intervals for 
the parameters that describe the signal model and 
the noise model. 



• Evaluation: The detection hypothesis is compared 
to alternatives through model selection. 

Here we study a fully Bayesian approach to the detec- 
tion problem. All three stages of the analysis are handled 
by a single technique - the Parallel Tempered Markov 
Chain Monte Carlo (PTMCMC) algorithm. The PTM- 
CMC algorithm establishes which regions of parameter 
space contain the highest posterior weight, efficiently ex- 
plores the posterior distribution function (PDF) of the 
model parameters, and calculates the marginalized like- 
lihood, or evidence, for the model. This procedure is 
repeated for the competing hypotheses under considera- 
tion (e.g. instrument noise, instrument noise plus a black 
hole inspiral signal, instrument noise plus N galactic bi- 
nary signals) and the evidence ratios, or Bayes factors, 
are then used to identify the hypothesis that is most 
consistent with our prior belief and the current data. 
The Bayes factors computed from the PTMCMC algo- 
rithm are checked against those computed using a Re- 
verse Jump Markov Chain Monte Carlo (RJMCMC) al- 
gorithm, and are found to be in very good agreement. 

Several solutions to components of the GW detection 
problem have recently been successfully demonstrated us- 
ing methods similar to our own. Parallel tempering has 
been shown to be effective in the characterization phase 
of ground-based data when applied to simulated signals 
from spinning black hole binary inspirals [7]. A model 
selection study of when an non-spinning inspiral wave- 
form injected into simulated LIGO data has a sufficient 
signal-to-noise ratio to be favored over a model contain- 
ing only instrument noise has been performed using the 
Nested Sampling algorithm [8, 9]. 

The paper is organized as follows: In §11 we describe 
the detection problem and how it manifests differently 
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for ground- and space-based interferometers. In §111 we 
briefly describe the Bayesian approach to gravitational 
wave data analysis and show how it can be used to char- 
acterize a candidate detection (parameter estimation) 
and to evaluate the detection confidence (model selec- 
tion). In §IV we set up an example problem of a sin- 
gle galactic binary signal injected into simulated LISA 
data and describe the parameterization of the models 
we wish to distinguish. In §V we explain the details 
of the search and characterization phase of the analy- 
sis. We briefly introduce MCMC techniques followed by 
an in-depth description of our specific implementation, 
including the prior and proposal distributions used in 
the MCMC algorithm and the use of parallel temper- 
ing to thoroughly sample from the posterior distribution 
function for the model in question (§V A) allowing us 
to simultaneously complete the search and characteriza- 
tion. The evaluation of the candidate detection follows 
in §VI where we show how to select between the com- 
peting models via thermodynamic integration (§VIA) 
across the PTMCMC chains. In §VII we introduce an 
additional method for calculating the evidence ratio, the 
RJMCMC algorithm (§VIIA). In §VIII we present the 
results of our study, including a demonstration of the 
search efficiency, a determination of the signal-to-noisc 
ratio at which galactic binaries become distinguishable 
from instrument noise, and consistency checks applied to 
the computation of the Bayes factors. In §IX we dis- 
cuss the challenges associated with implementing these 
techniques, as well as the innovations which allowed us 
to achieve the desired reliability. We then outline the 
simplifications that went into this work and what impact 
they will have when included in more general detection 
problems. 



II. THE DETECTION PROBLEM 
A. Ground Based Detectors 

For ground based GW detectors the detection prob- 
lem exists because strong GW signals arc rare, while 
large-amplitude transient noise events in the detectors 
are common. One must then be able to distinguish be- 
tween these noise "glitches" and GWs. The majority of 
glitches can be excluded by looking at auxiliary instru- 
ment monitoring channels (data quality tests and vetoes 
in LIGO parlance), and by applying coincidence tests 
among the world-wide network of detectors. 

As early as 1989 it was suggested by Davis [22] that 
the gravitational wave detection problem may best be 
addressed in a Bayesian framework. Indeed, many of the 
seminal papers [10-15] on gravitational wave data anal- 
ysis employ Bayesian probability theory. These analysis 
are generally of a hybrid type, where Bayesian reason- 
ing is used to motivate the form of a particular frcquen- 
tist statistic. This hybrid style of analysis is particularly 
evident in recent studies of stochastic backgrounds [16], 



unmodeled bursts [17], methods for setting upper lim- 
its [18], and targeted searches for continous signals [19]. 
Standard choices of detection statistic include signal-to- 
noise or likelihood ratios, and measures of correlation 
across the network. Candidate events are identified us- 
ing these statistics, and the significance of a given event 
is established from Monte Carlo studies of signal injec- 
tions and time slides of the data. The hypotheses that 
are being compared, and the assumptions that are being 
made, are not always clearly spelled out in this approach, 
and in some cases may even be unphysical [17, 19]. 

There are several reasons why the Bayesian approach 
was not immediately adopted in practice, chief among 
them being the computational challenge of evaluating 
the high dimensional integrals that appear in the analy- 
sis, and the lack of a reliable model for the instrument 
noise. In recent years the development of efficient compu- 
tational techniques such as Markov Chain Monte Carlo 
methods (see e.g. Ref. [20]) and Nested Sampling [21], 
coupled with the widespread availability of cheap, high 
speed compuational resources, have helped to address the 
computational challenge, but the problem of modeling 
the instrument noise remains. 

The Bayesian approach to model selection forces us 
to defined explicit models for the hypotheses under con- 
sideration. Once the models are defined the analysis is 
entirely mechanical. The end result is an odds ratio for 
one hypothesis over another. Wc will see the Bayesian 
approach does not yield a fixed signal-to-noise threshold 
at which a particular class of signal becomes detectable, 
but rather, the signal-to-noise ratio at which a particular 
instance of the signal in a given noise realization becomes 
detectable can vary substantially. A note of caution here 
is the analysis is not answering the question "is there 
a gravitational wave signal present in the data, or is it 
just instrument noise?", but rather, "is the data most 
consistent with our model of the the gravitational wave 
signal and instrument noise, or our model of the instru- 
ment noise alone?". In particular, if our model of the 
instrument noise is poor - such as not allowing for occa- 
sional glitches - then the odds may favor the detection 
hypothesis even when no signal is present in the data. 
The goal is to find models for the signals and instrument 
noise [16] that arc sufficiently realistic so that the hy- 
potheses that are being tested closely approximate the 
physical situations we hope to compare. Here we focus 
on the development of a particulalry powerful technique 
- Parallel Tempered Markov Chain Monte Carlo - to ad- 
dress the computational challenge of applying Bayesian 
Inference to gravitational wave data analysis, and defer 
the development of more realistic noise models to a forth- 
coming publication. 



B. Space Based Detectors 

Whereas LIGO grapples with finding rare signals in a 
sea of noise transients, the space-based detection prob- 
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lem offers a different set of challenges. The most readily 
available sources expected to exist in the LISA band- 
width are the millions of binary star systems comprised 
of stellar mass compact objects (mostly white dwarfs) 
within the galaxy [23]. Hundreds of thousands of such 
sources will have sufficient amplitude to measurably ex- 
cite the LISA detector [24, 25]. Binaries which have al- 
ready been detected electromagneticly can be used as ver- 
ification sources for LISA by which the performance of 
the mission can be evaluated. It should be possible to iso- 
late tens of thousands of galactic binary signals from the 
LISA data, with the remainder blending together to cre- 
ate a confusion background which is the dominant source 
of noise between 10~^ Hz and 3 x 10^^ Hz [26] . It is there- 
fore critical to mission success that the maximum number 
of these sources can be cleanly regressed from the data 
to allow access to the more exotic discoveries which will 
otherwise be obscured by the galactic foreground. 

Many methods for attacking this problem have been 
developed and demonstrated in increasingly more sophis- 
ticated tests supplied by the Mock LISA Data Challenge 
Task Force [27] . Because of the degree of overlap between 
the tens of thousands of resolvable binaries, a global fit is 
needed [28, 29]. Since the number of resolvable sources is 
a priori unknown, the analysis must be able to find the 
best fit model along with the best fit parameters for a 
given model. Models with more parameters will generally 
have higher likelihoods, but not necessarily higher evi- 
dence, as their prior distributions are spread over a larger 
volume. Thus, a Bayesian approach to LISA data analy- 
sis automatically selects the most parsimonious model. 



III. BAYESIAN INFERENCE 

Bayesian probability theory is slowly making its way 
into the mainstream of astrophysics and astronomy [30- 
33]. In the Bayesian framework, the data d is used to 
update our prior belief p(Ti.) in hypothesis Ti. according to 
the (marginalized) likelihood p((i|7i) that our hypothesis 
could have generated d under the performed experiment. 
The posterior belief in the hypothesis is given by Baycs' 
theorem: 



pin\d) 



p{d\H)p{H) 

p{d) 



(1) 



For now p{d) is an unimportant normalization constant. 
Unfamiliarity with Bayesian techniques can often result 
in discomfort with the way prior belief in the hypoth- 
esis under scrutiny impacts the result of the analysis. 
This is actually a feature of Bayesian analysis as it re- 
quires the user to disclose the assumptions made about 
the model being tested. Searle et al [17] give an interest- 
ing discourse on the hidden assumptions made by some 
commonly used GW detection statistics. 

In model selection applications, the hypotheses are the 
competing models M and the posterior p{M\d) is the 



probability (density) that M. is the appropriate descrip- 
tion of the data. In practice this requires prior knowledge 
of all possible models testable by our experiment, which 
is impractical. Instead we must ask our data more care- 
fully constructed questions, such as "how do two specific 
models compare?" Since the normalization in cq. 1 is the 
same for all models ratios of the posterior odds eliminate 
the need to know all hypotheses a priori and yields a use- 
ful quantitative measure of confidence, the odds ratio, for 
one model over another: 
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p{Mi\d) p{Mi)p{d\Mi 



p{Mo\d) piMo)pid\Ma) 



(2) 



OiQ is the odds ratio for models A4i and A4o- An odds 
ratio of > 3 favors A4i [34]. Evaluating the Bayes factor 
(also known as the evidence- or marginal likelihood-ratio) 



Bio = 



_ pjdlMi) 
p{d\Mo) 



(3) 



is the goal of the evaluation phase of the analysis. Bio 
is interpreted as the degree of confidence in the appro- 
priateness of one model versus another in describing the 
available data. The prior odds Vio = p{Mi)/p{Mo) 
reflect any preference between models before the exper- 
iment is conducted. If we are to adopt one model over 
the other, the Bayes factor must overwhelm the prior 
odds, thus signaling the data is sufficiently informative 
to distinguish between the competing hypotheses. 

For the current generation LIGO detection problem 
Vio would likely down-weight the chances of a resolv- 
able GW signal based on some combination of reason- 
able event rates and detector behavior around the time 
of the candidate detection. This is analogous to setting 
a relatively high threshold which must be exceeded for a 
candidate detection to warrant further scrutiny. For the 
LISA extension of this same problem (due to the large 
number of anticipated detections) the prior odds would 
likely be far less restrictive, leaving the analysis more 
open to the possibility of a signal being present. 

Advocates of Bayesian inference cite a natural parsi- 
mony in the model selection application akin to Occam's 
Razor. The claim is that if two models give similar 
quality "fits" to the data the lower dimensional model 
will be preferred. This line of reasoning is only valid 
when accompanied by a strict qualifying statement: Ex- 
cess model parameters must be constrained by the data. 
The marginalized posterior distribution function of an 
unconstrained parameter will match the prior distribu- 
tion, meaning the data was incapable of updating our 
understanding of that parameter and can not be used to 
answer questions about whether or not that quantity "be- 
longs" in the model (See Ref. [35] for an excellent discus- 
sion of this and other misconceptions regarding Bayesian 
inference). That model then incurs no penalty for having 
unconstrained variables included in its parameterization. 
If the remaining parameters do as well describing the 
data as some other, lower dimensional model, both are 
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equally viable. Reverend Bayes brandishes his razor only 
if a particular model requires more parameters to achieve 
a similar fit. 

The "penalty" in the Bayes factor is roughly equal to 
the ratio between the widths of the posterior and the 
prior distributions of the additional parameters. If this 
is near unity than there is no penalty for the additional 
dimensions and the two models are indistinguishable. 
Tightly constrained model parameters will incur a large 
Occam factor and thus must substantially improve the 
likelihood in order for that model to be preferred. This 
is a somewhat subtle and often overlooked point that, 
when addressed, enhances the conceptual understanding 
of Bayesian model selection. 

For parameter estimation applications, the hypothe- 
sis states that the acquired data is the result of some 
process parameterized by 9. The posterior distribution 
fimction p{0\d, A4) is the probability density function for 
the model parameters over the parameter space. Pro- 
ducing a well sampled PDF is the goal of the search and 
characterization phase of the analysis. The global maxi- 
mum of the PDF, or maximum a posteriori (MAP) point, 
occurs where the model parameters are the "best fit" val- 
ues for the hypothesis in question (weighted by the prior 
belief in those parameter values). The distribution about 
this location in parameter space reveals how well the data 
constrains the hypothesis. Using Bayes' theorem we have 



p{e\d,M) 



p{d\e,M)p{9,M) 
p{d\M) 



(4) 



In this instance p{9,A4) is the prior distribution of the 
model parameters, and p{d\d,M) is the likelihood that 
the model could have generated data d for model param- 
eters 9. The normalization constant 



pid\M) = / d9 p{9,M)pid\e,M) 



(5) 



is the marginalized likelihood, or evidence, for the model. 
In practice it quickly becomes too costly to perform a di- 
rect evaluation of the posterior distribution function. In 
typical GW data analysis applications the number of pa- 
rameters needed to describe the signal model and noise 
model can be very large (ranging from tens to hundreds 
of thousands), and the regions of significant posterior 
weight generally occupy a minute fraction of the total 
prior volume. Resolving the peaks in the posterior dis- 
tribution function requires a very fine sampling of the 
parameter space, but if this sampling is extended over 
the entire prior volume the total number of samples can 
become astronomically large. 

The Markov Chain Monte Carlo approach encompasses 
a powerful set of techniques for producing samples from 
the posterior distribution. These algorithms focus their 
attention on regions of high posterior weight, and neatly 
avoid the problem of computing model evidence. The 
later feature becomes less desirable when it comes time to 
test competing hypotheses. One solution is to generalize 



the transitions between steps in the Markov Chain to in- 
clude "trans-dimensional" moves between different mod- 
els, resulting in what are termed Reverse Jump Markov 
Chain Monte Carlo (RJMCMC) algorithms. The num- 
ber of iterations that the chain spends in models A4 1 and 
A4q provides an estimate for the evidence ratio Biq. A 
more complete description of this approach can be found 
in §VIIA. 

In what follows we will focus on a comprehensive ap- 
proach to Bayesian data analysis that combines the Par- 
allel Tempered Markov Chain Monte Carlo algorithm for 
conducting the search and performing the characteriza- 
tion with thermodynamic integration across the chains 
providing a robust estimate for the model evidence. 



IV. TOY PROBLEM 

To test our analysis scheme we want to use a toy 
problem that allows for fast computation of likelihoods, 
while still being fairly realistic. To that end we gener- 
ate a 256 frequency-bin segment of simulated LISA data 
which contains instrument noise and the signal from a 
single chirping galactic binary over an observation time 
of Tobs = 2 years. 

Ideal GW data s has two independent contributions: 
The incident gravitational radiation convolved with the 
transfer function of the detector generates some response 
h in addition to noise n caused by some collection of 
random processes within the instrument. This can be 
written in the Fourier domain as 



(6) 



where the subscript a indicates the output specific to a 
single detector (in the case of the ground-based effort) 
or a particular interferometer channel of a single detec- 
tor (as for LISA). The LISA detector will be an array of 
three satellites, each housing a pair of test masses and 
the appropriate optical system to precisely monitor the 
distance between itself, the test masses, and the other 
two satellites. The relative distance measures between 
each satellite will be combined to form three coupled 
Michelson-type interferometer channels. We will work 
in the noise-orthogonal time-delay interferometry basis 
which is formed out of linear combinations of the Michel- 
son channels, commonly referred to as channels A, E, and 
T [36] . This choice has the added benefit of reducing to 
only two data channels of interest, as T is free of GW 
signals within the expected frequency range of galactic 
binaries. 

We first generate a noise realization for the frequency 
range of interest by drawing from the theoretical LISA 
noise spectral density (see §IVB). We then inject a 
test source somewhere in the middle third of the data 
with some tunable amplitude so that we may establish 
at what signal to noise ratio (SNR) the source becomes 
detectable. To be considered effective our algorithm must 
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find the MAP parameters for two distinct models 
M, : S{f) = n{f) 

Ml : S{f) = n{f) + h{f), (7) 

thoroughly resolve each models PDF. and calculate the 
evidence for each model (up to some constant factor com- 
mon between A4i and A^o)- We will consider the prior 
odds between the two models to be unity, thus allowing 
us to interpret the evidence ratio (Bayes factor) as the 
odds ratio. 



A. Modeling the Source: Galactic Compact 
Binaries 

The choice of using galactic binaries as the test source 
was motivated by the fact that waveforms for these ob- 
jects can be rapidly modeled using the fast-slow decom- 
position described in [37] . Despite the simplicity of gener- 
ating these waveforms, they posses sufficient complexity 
that we do not anticipate significant additional complica- 
tions related to the modeling/fitting of other, more exotic 
sources (the same techniques described here, with a few 
enhancements, are proving sucessful at handling spinning 
black holes and extreme mass ratio inspirals). 

Detached, mildly chirping compact binaries are well 
modeled, and can be parameterized using eight quantities 
which we will consider as components of a parameter 
vector A: 

(lnA,/robs,/T;2^s-cos6',(/),V,cosi,(po) (8) 

The physical parameters are the amplitude A, GW fre- 
quency / (twice the binary's orbital frequency), fre- 
quency derivative / = df/dt, co-latitude and azimuthal 
sky angles 9 and (p, GW polarization angle tp, orbital in- 
clination i, and phase ipo of the gravitational waves. The 
quantities /, / and a-re defined at some fiducial time 
such as when the first data is taken. 



B. Modeling the Instrument Noise 

The contribution to the data from instrument noise in 
each Fourier bin, n(/), has spectral density Sn{f) with 
mean and variance 

("(/).) = 
{nifhHfW = ^S.,SM) (9) 

respectively. For operational detectors this is approxi- 
mately true but excursions from the expected level are to 
be anticipated. In preparation for this reality during the 
LISA mission we developed our algorithm to simultane- 
ously fit for the noise level in the data. We parameterize 
the noise by assuming that departures from Sn{f) can 



be characterized by a rescaling of the noise spectral den- 
sity. This rescaling is unique to each channel (A and E) 
while Sn{f) is identical for both. A more sophisticated 
approach would be to fit for the twenty-four components 
of noise in the LISA constellation (the laser phase noise 
between, and the position noise of, each test mass). This 
approach has been developed by Adams and Cornish [38] , 
but has not been implemented here. 

The simulated segment of LISA data is divided into 
four equal length sub-regions in frequency space of size 
A/. Since the noise spectral density is reasonably con- 
stant over the entire data segment the central frequency is 
used to evaluate Sn{f) which will furthermore be treated 
as constant, while the noise level in each sub-region i and 
interferometer channel a is rescaled by rjl^ to yield the 
modeled noise level 

S^if) - V?Sn. (10) 

For the case where there are two non-negligable interfer- 
ometer channels (A and E) a total of eight noise param- 
eters are required. An example of these parameters for a 
particular noise realization can be seen in Fig. 1. 

We have now fully parameterized two competing mod- 
els 

Mo-.do = Y1 

j 

Mi-Ji = X + Y,Vj (11) 

j 

which can be used to explain the data in question. Our 
attention will subsequently turn to applying Bayesian 
tools to establish the most appropriate configuration of 
each model given the simulated data, and to learn which 
model most appropriately describes the data in question. 



V. IMPLEMENTATION OF BAYESIAN 
TECHNIQUES: DETECTION & 
CHARACTERIZATION 

A. Markov Chain Monte Carlo 

Markov Chain Monte Carlo (MCMC) [20, 39, 40] tech- 
niques are becoming quite familiar in the GW commu- 
nity [41] thanks to their great effectiveness in solving 
parameter estimation problems. Variants of these tech- 
niques have also proven quite effective as search algo- 
rithms [28, 29, 42-44]. Here a brief description of the 
MCMC technique is followed by a more in-depth discus- 
sion of the particular implementation that we used. 

The utility of the MCMC approach stems from its abil- 
ity to explore the parameter space in such a way that the 
resulting chain samples directly from the target poste- 
rior distribution function. This is accomplished by first 
adopting some position in parameter space x as the first 
"link" in the chain. A subsequent move to a new position 
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A channel noise fit 




2.9995 3 
E channel noise fit 




FIG. 1: MAP noise parameters for A4q (red, solid) and A4i 
(blue, dashed). The data consists of two interferometer chan- 
nels, each containing 256 Fourier bins which are divided into 
four sub-regions. Each sub-division is fitted with a unique 
noise parameter. The signal is injected somewhere in the 
second quarter of the data. Model A^o elevates the noise 
parameter in the second window to account for the excess 
power caused by the gravitational wave signal. Model Aii 
successfully fits to that gravitational wave leaving the noise 
parameter closer to unity. Notice how the noise parameters 
for remaining portions of the data are nearly identical between 
the two models. 



y is proposed using some (normalized) distribution q{y\x) 
read "the probability of moving to y given that the cur- 
rent location is x". The new likelihood p{s\y) and prior 
probabihty p{y) are evaluated and the new position is 
accepted with some probability k = min[l,_ff] where H 
is the Hastings ratio 



pis\y)piy)q{x\y) 

""^^ p{s\x)p{x)q{y\x) 



(12) 



The Hastings ratio is designed to ensure detailed balance, 
and hence reversibility, between subsequent steps in the 
chain. This ensures free exploration of the PDF with- 
out any bias or hysteresis. The process of stochastically 
stepping through parameter space is repeated until some 
convergence criteria is satisfied. Afterwards, the number 
of iterations spent in a particular region of parameter 
space, normalized by the total number of steps in the 
chain, yields the probability that the source parameters 
take values in that region. 

Marginalized posterior distribution functions can then 
be inferred by constructing histograms of the parame- 
ter values (discarding early iterations from the "burn in" 
phase before the chain reaches stationarity). A com- 
mon practice is to choose a proposal distribution that 
is symmetric, thus eliminating the need to include it in 
the calculation of H. The choice of q{y\x), by construc- 
tion, can not alter the recovered posterior distribution 



function. The proposal distribution does, however, dra- 
matically affect the acceptance rate of trial locations in 
parameter space and, therefore, the number of iterations 
required to satisfactorily sample the joint PDF. 



1. Evaluating the Likelihood 

The standard likelihood definition used in matched 
filtering searches with ideal (stationary and Gaussian) 
noise is 



3(s|A) = Ce- 



(13) 



where 



h{X)\s — h(X)) and the normalization 



constant C is independent of the source parameters [45] . 
It does however depend on the noise parameters. The 
brackets (-j-) denote a noise weighted inner product 
which, for a finite observation time Tobs, is defined as 



obs 



f 



(14) 



with the sum over a indicating the presence of multiple 
interferometer channels contributing to the data. These 
can be from different detectors {e.g. LIGO HI, H2, and 
LI) or different combinations of channels from a single 
interferometer (LISA) . 

In our implementation the noise spectral density is not 
fixed, and care must be taken to account for this when 
computing the likelihood. In particular, the normaliza- 
tion of the likelihood depends on the noise parameters. 
By elevating the non-constant components of C into the 
argument of the exponential the likelihood is now written 



p(s| A, ff) = C' exp 



iV^ln) 



(15) 



a, 3 



where N = Tobs A/ is the number of Fourier bins over 
range A/ and the sum over j ranges over the number of 
A/ segments in the data. Meanwhile our definition of the 
noise weighted inner product (eq. 14) must also rcffect 
the parameterization of the noise by replacing S"" (/) with 



2. Prior Distributions 

One of the selling points for gravitational wave astron- 
omy is that it will open a window on astrophysical sources 
where relatively little is currently known. Consequently, 
the prior distributions for the parameters in our mod- 
els will tend to be fairly wide. For the problem at hand 
there are population synthesis models that combine our 
best understanding of the physical processes that govern 
stellar evolution and galactic structure, but a paucity of 
observational data, especially for short period systems. 
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mean that there are considerable uncertainties. One 
could combine the available models to produce predic- 
tions for the joint probability of systems having a certain 
period, sky location, distance from the Earth, masses, 
and mass transfer rate, then translate these into a joint 
prior for A, /, /, 6, and (f>. The orientation of the orbit is 
expected to be random, as is the initial orbital phase, so 
the parameters ip^ cos t, and ipo should be uniform across 
their allowed range. These astrophysical priors may be 
supplemented by considerations of the observational bi- 
ases, such as the inability to detect sources below a cer- 
tain amplitude threshold. 

For our toy problem we simplified the analysis by 
adopting uniform priors for In A, f, cos 9, and (j). We 
did however use a particular population synthesis model 
to provide a prior for the frequency derivative, after 
marginalizing over the other model parameters. For a 
detailed description of how this prior is constructed, and 
its effect on parameter estimation, see Ref. [37]. Save for 
the amplitude, all of the parameters in our model have 
a natural range, so it is easy to construct properly nor- 
malized prior distributions. The choice of range for the 
amplitude requires more thought. 

For the amplitude we used a very conservative lower 
limit for what might be detectable, and an upper limit 
that is much larger than any of the injections used in 
the study. A rough estimate for the signal-to-noisc ratio 
(SNR) of a source with amplitude A can be derived by 
assuming that all the signal power is deposited in a single 
Fourier bin: 



SNRc 



AV%- 



bs 



(16) 



where Smif) is the noise spectral density for a Michelson- 
like interferometer channel, which is related to the noise 
in the A, E TDI channels by 



Snif) 



4 sin^ (///*) 



(17) 



Unless otherwise stated, we used a uniform prior range 
on In A such that estimated signal-to-noise ratio lay in 
the range SNRest G [1;20]. The lower end of this range 
is certainly not detectable, so we expect to find that the 
model selection procedure will be unable to strongly fa- 
vor the noise model over the signal-|-noise model, as the 
signal model extends to what is effectively zero ampli- 
tude. To explore the impact of the lower bound we will 
also consider the effect of setting an amplitude thresh- 
old corresponding to SNRost = 5. Put another way, the 
non-detections of gravitational waves by ground based 
instruments does not rule out the existence of gravita- 
tional waves, but rather, sets limits on the event rate 
and strength of GW signals. 

It is expected that the results of the analysis will vary 
depending on the choice of prior distributions, but hav- 
ing uninformative priors also yields a significantly more 
difficult problem to solve. The fraction of the prior vol- 
ume which contains the majority of the support for the 



posterior distribution function governs the difficulty of 
the analysis. For galactic binaries, which have relatively 
small dimension and simple waveforms when compared to 
more exotic LISA sources (such as extreme mass-ratio in- 
spirals and spinning binary black holes) we find that, for 
marginally detectable signals in our toy problem, ~ 90% 
of the posterior mass is contained within ~ 10^^^ of the 
prior volume. That is what makes the search for gravita- 
tional wave signals so difficult. The ratio is many orders 
of magnitude smaller for more complex signals. An ad- 
ditional complication is that the parameter volume that 
contains ^ 90% of the posterior mass is typically com- 
prised of multiple disjoint regions, and the analysis tools 
must be able to explore all of these posterior modes. 



3. Proposal Distributions 

The scheme for proposing trial positions in the param- 
eter space has no bearing on the ultimate results of any 
MCMC analysis but it greatly affects the amount of time 
required to arrive at that solution. Convergence time is 
at a minimum if the proposal distribution matches the 
posterior distribution. Although we can not expect to 
know the "answer" (the PDF) until after we've done 
the analysis, physical considerations can help in the de- 
sign of proposal distributions that approximate the pos- 
terior. The noise- model parameters are independent of 
one another meaning each should have a PDF described 
by a chi-squared distribution with N degrees of freedom 
(N is the number of samples in the data window where 
the noise parameter is applied, see §VA1). For a suffi- 
ciently large number of frequency bins per window the 
chi-squared distribution can be approximated as a nor- 
mal distribution with variance = 1/N . New noise 
parameters can then be proposed by drawing from this 
distribution via: 



Vx 



■iV[0,l] 



(18) 



where A^[0, 1] is a normal distribution with zero mean 
and unit variance, D,, is the noise-model dimension and 
is used to rescalc the jump to rjy. This ensures the total 
noise parameter jump to be typically one standard devia- 
tion from the mean of the presumed joint noise posterior 
distribution function. 

The signal-model parameters arc known to have non- 
zero correlation so efficient proposal distributions should 
be constructed using the inverse covariance matrix of the 
parameter space. This is commonly approximated by the 
Fisher Information Matrix (F) with components 



r, 



(19) 



Here h^i denotes a partial derivative of the template h{X) 
with respect to the i^^ parameter [46] . Jumps in param- 
eter space are then made in the eigen-directions of the 
Fisher by drawing from iV[0, 1] and scaling it by the as- 
sociated eigen-value. These jumps are then additionally 
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rescaled by l/^/D\, where Dx is the signal-model di- 
mension to again achieve typical jumps of one standard 
deviation. Instead of re-evaluating the matrix at every 
step of the chain, we keep the same Fisher for many it- 
erations (making for a symmetric proposal distribution), 
under the assumption that it does not vary greatly in the 
vicinity of a local maximum. Although this type of pro- 
posal distribution very efficiently samples the posterior in 
the vicinity of the signal parameters (for sufficiently high 
SNR) it is a poor choice for sampling the entire posterior, 
as large jumps are very unlikely to occur. 

Since the Fisher Information Matrix is a local approx- 
imation to the inverse covariance matrix, it can not be 
used to predict the parameter correlations far from the 
current location. For larger jumps we use a simpler pro- 
posal that takes the estimated variance for each param- 
eter (as determined from F) and use this to scale ad- 
ditional draws from iV[0, 1] to jump in parameter direc- 
tions. This proposal distribution cocktail is then rounded 
out by occasionally proposing uniform draws on the prior, 
ensuring that trial samples cover the entire prior volume 
during the exploration of the target distribution. This 
collection of proposal distributions have the added bene- 
fit of being symmetric in 9^ and 9y, allowing us to neglect 
their contribution to the Hastings ratio. 

4- Parallel Tempering 

Although our proposal distributions permit explo- 
ration of the entire prior volume, the bolder jumps are 
rarely accepted as the chance of making large changes to 
a large number of parameters and still ending up at a 
location with decent likelihood is vanishingly small. Fur- 
thermore, it is our desire that this Markovian chain can 
also be used to find the injected signal even if it is started 
randomly within the prior volume. A straight forward 
implementation of the MCMC algorithm will never (prac- 
tically) find the MAP for parameter spaces of this com- 
plexity without some assistance. Usually this assistance 
comes in a way that violates the detailed balance condi- 
tion between subsequent iterations and thus nullifies the 
Markovian nature of the chain (resulting in biased sam- 
ples which do not mirror the PDF). This is of no negative 
consequence as long as these non-Markovian steps in the 
chain are discarded as burn-in samples, but it does put 
tremendous pressure on the burn-in to put the chain very 
near to the global maximum. 

To encourage efficient global sampling of the target 
distribution, which in turn expedites the convergence 
time of the chain, while preserving detailed balance we 
have adopted the powerful technique of parallel temper- 
ing [47], where multiple chains explore the data simulta- 
neously, each sampling from an iteratively higher "tem- 
perature" target distribution 

p{9\sj3)^p{9)p{s\fff (20) 

where (3 is analogous to an inverse temperature. This 



effectively "softens" the likelihood surface allowing the 
chain easier access to positions in parameter space that 
would otherwise be difficult to reach. Information from 
"hot" chains is allowed to propagate towards colder 
chains (and vice versa) by randomly proposing param- 
eter exchanges between temperature levels. An exchange 
of parameters between the i'^ and j^^ chain satisfies de- 
tailed balance if the conditional probability is evaluated 
with Hastings ratio: 

p{s\9,,M.,P,)piM9,,M,,Pj) 

Only points in the /3 = 1 chain sample the true pos- 
terior and are therefore permitted to contribute to the 
Markov chain from which the PDF is inferred. However, 
by exchanging parameters with the hot chains the /3 = 1 
chain rapidly explores the full target distribution reveal- 
ing a wealth of information about the posterior. This is 
in contrast to chains without parallel tempering which 
are prone to "sticking" at or near the maximum of the 
posterior, or worse, some secondary maxima. Although 
non-parallel-tempered chains efficiently sample the re- 
gion around some maxima they can completely miss (for 
a finite run time) the more complex structure of the tar- 
get distribution. 

Although parallel tempering increases the computa- 
tional cost of each iteration, for well constructed heating 
schemes, Nc chains running for / iterations generically 
converges faster than one chain allowed to run for Nc x / 
iterations. Figure 2 clearly demonstrates the advantage 
of augmenting a typical MCMC with parallel tempering. 
The figure shows two dimensional marginalized posterior 
distribution functions as sampled by an MCMC as de- 
scribed above. The injected source was a single galactic 
binary with a SNR ~ 7. For demonstrative purposes the 
chains were initiated at the injected signal parameters 
and underwent 1.5 x 10^ iterations, the first 5 x 10^ be- 
ing discarded as burn in samples. The PDFs are for the 
In^ - /Tobs and cos 9 - cj) planes. The MCMC without 
parallel tempering clearly samples the region around the 
injected parameters but is confined to that location in 
parameter space and would need to run beyond a time 
that is practical in order to sample the full distribution. 
The parallel tempered MCMC samples the full PDF very 
efficiently, and for this example discovers a global max- 
imum that significantly differs from the injected param- 
eters signaling how the MAP parameters can be pushed 
away from the injected parameters by the noise. 

This particular example also demonstrates the harm 
in being solely interested in the "best fit" parameters, as 
opposed to the full distribution. Had a search been per- 
formed over this data (as opposed to starting the chain 
at the injected parameters) through some MCMC driven 
technique (such as using simulated annealing during the 
burn-in phase) without parallel tempering it is likely that 
only the global maximum will have been sampled by the 
post burn-in MCMC recovering signal parameters that 
are measurably different from the physical parameters of 
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the source which is producing the gravitational waves. 
This may be of Httle consequence for a single galactic bi- 
nary, but for more exotic signals where source parameters 
will be used to learn about important astrophysics, or 
where optical counterparts will be of great interest, it is 
clear that the full PDF is vital for complete understand- 
ing of the underlying astrophysics. Alternatively, a well 
constructed PTMCMC algorithm will simultaneously lo- 
cate maxima in the search space (satisfying the search 
phase of the detection problem) while accurately and ef- 
ficiently mapping out the posterior distribution function 
(characterizing the gravitational wave source). 



direct calculation (instead of an approximation ) to the 
evidence [50] . It is not necessary to thermodynamic inte- 
gration that the different temperature chains be allowed 
to exchange parameters, however allowing them to do so 
ensures healthy mixing and convergence. 

The evidence ratio between competing models A4i and 
A4o then yields the Bayes factor for the models in ques- 
tion. We now have a single algorithm that, if imple- 
mented effectively, solves all three facets of the detection 
problem free of any approximations which may render 
the result suspect. 



VI. IMPLEMENTATION OF BAYESIAN 
TECHNIQUES: EVALUATION 

A. Thermodynamic Integration 

Although only the /3 = 1 chain samples the target dis- 
tribution in the PTMCMC approach, the higher tem- 
perature chains serve as more than just a convergence 
aid. These additional chains can also be utilized to cal- 
culate the model evidence. For normalizable priors we 
can evaluate p{d\M) from eq. 5 by using the expectation 
value of the likelihood for each temperature level. First 
we consider the evidence for each temperature's posterior 
distribution function as part of some partition function 
Z{f3) where 

z(/3) = j de p{d\e,M,fi)p0M) 

= j de p{d\e,Mfp0,M). (22) 

Because the prior is independent of (3 we can write the 
partition function as 

-^\nZ{P)^{\np{d\e,M))p (23) 

where (lnp((i|6', A^)) p is the expectation value of the like- 
lihood for the chain with temperature 1//?. Then the 
evidence can be found by integrating over (3 via 

\np{d\M)= f dl3 {\np{d\e,M))p. (24) 

Alternatively, after the data has been analyzed under 
both models the Bayes factor is calculated by 

InSio = r f3i{h-^Pid\Oi,Mi))0 

J —OC 

^ {lnp{d\0o,Mo))[3) dlnp. (25) 

The expectation value of the likelihood is evaluated over 
the post-burn-in chain iterations [48, 49]. This tech- 
nique, known as thermodynamic integration (TI), is a 



VII. MODEL SELECTION CROSS VALIDATION 
A. Reverse Jump Markov Chain Monte Carlo 

Although TI is robust and converges predictably, we 
feel that for "high stakes" model selection problems (such 
as the first GW detection) several complimentary meth- 
ods should be used to support the detection model over 
any alternative hypotheses in order to establish addi- 
tional confidence in the detection. One approach for 
such a case is to use alternative techniques for calculat- 
ing the Bayes factor. For this additional checking stage 
we have chosen the "gold standard" of Bayes factor cal- 
culators: The reverse jump Markov chain Monte Carlo 
(RJMCMC) algorithm. 

This breed of MCMC was originally developed by P.J. 
Green in 1995 [51] and is unique in that it has the ability 
to transition between competing models, effectively mak- 
ing the model one of the search parameters. This tech- 
nique, given adequate mixing and convergence, has the 
advantage of directly sampling from p{A4\d) as opposed 
to approximating or calculating the model evidence. Like 
its fixed dimension counterpart, the RJMCMC is guar- 
anteed to converge to the appropriate PDF. The model 
posterior is determined by the number of iterations the 
chain spends in each model (normalized by the total num- 
ber of iterations) . With a small number of models present 
(as in our example) a more useful representation of the 
information garnered by the RJ algorithm is to calculate 
the Bayes factor for two competing models 

^ # of iterations in Mi 

-^10 = ^ r — -■ ■■ — rr- (26) 

# of iterations m Mo 

Allowing for the exploration of different models (which 
may differ in dimension) requires a separate Hastings step 
which proposes to move the chain from one model to an- 
other. Parameters for trial model Mi are drawn from 
q{9i,Mi). If the models are nested (perhaps proposing 
additional parameters to include in the existing set) all 
of the like parameters are held fixed while the new pa- 
rameters are drawn from q. In the detection problem 
case, where the models are disjoint, all model parame- 
ters must be randomly selected. Once the new model's 
parameters arc in hand the trans-model Hastings ratio 
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FIG. 2: Two dimensional marginalized log posterior densities for a single galactic binary with SNR = 7 as approximated 
by a MCMC. On the left is the In A — /Tobs plane, the right shows a Molweide projection of the sky location (parameters 8 
and (f)). Red (white) locations have the highest log probability density while blue (black) has the least. The top panel is for 
an MCMC without parallel tempering where the chain was started at the injected parameters and allowed 5 x 10^ burn-in 
iterations. The bottom panel is the same, only now with twenty parallel chains spaced geometrically in heat with a maximum 
temperature of 100 (/9max = 0.01). The PTMCMC samples the full distribution, resolving the global structure of the PDF, 
while the straight-forward single chain MCMC never leaves the region around injected parameter values, missing the global 
maximum entirely (which happens to be pushed off from the injected values by instrument noise) . 



(again, satisfying detailed balance) is calculated by 

where the Jacobian | | accounts for any change in di- 
mension between models Mi and Mj which are param- 
eterized by 9i and 0j respectively. Selecting an efficient 
proposal distribution for model transitions is typically 
the major obstacle in the implementation of an efficient 
RJ routine. If the proposal distributions yield the model 
parameters directly, instead of a set of random numbers 
which are then used to determine the new model pa- 
rameters, the Jacobian is unity and can be neglected. 
RJMCMC algorithms are notoriously difficult to imple- 
ment because dimension changing moves are typically 
accepted with very low frequency causing the chains to 
converge slowly. This problem is exacerbated when com- 
peting models are of high dimensionality. It is therefore 



vital to the success of the algorithm to construct trans- 
model proposals which allow for decent mixing between 
the two models. Without adequate between-model tran- 
sitions the inferred Bayes factors are suspect. 



1. Transdimensional Proposal Distribution 

It took us many attempts to get the RJMCMC 
routine to work. Our initial efforts used the Fisher 
approximation to the posterior in the neighborhood 
of the MAP parameters found from the PTMCMC 
search/characterization. This technique showed initial 
promise by producing qualitatively reasonable Baycs fac- 
tors (monotonically increasing as a function of SNR) and 
stability (producing similar results through different ran- 
dom seeds). However, further testing revealed disagree- 
ment with the thermodynamic integration and erratic be- 
havior including large increases in Baycs factor for very 
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FIG. 3: Average likelihood for chains of different heats /? = 
1/T. The red (solid) line is for the noise only model Aio while 
the blue (dashed) line is for the signal plus noise model Mi. 
This particular example was for data containing a SNR — 8 



small increases in SNR, and occasionally questionable re- 
sults (such as "detectable" sources with SNR well below 
five). This was confounding until we noticed that the 
RJMCMC chains were producing PDFs that differed sig- 
nificantly from the PTMCMC results. This signaled poor 
exploration of the pror and a lack of convergence towards 
the true distribution. It wasn't until the Fisher approx- 
imation became sufficiently accurate (the SNR became 
sufficiently large) that this proposal distribution mim- 
icked the target distribution well enough to allow for ad- 
equate mixing. It was difficult to immediately diagnose, 
based on the behavior of a RJ chain alone, what the prob- 
lem was, and without having the thermodynamic integra- 
tion results for comparison, it would have been difficult 
to know if alternative approaches fixed the problem. 

We settled on a method for constructing trans-model 
proposals first suggested by Green [52] to start with fixed 
dimension MCMC trial explorations of each model in 
question. The resultant joint PDF from this learning 
period can be used as the proposal distribution used 
to move into that model. The implementation of this 
scheme which would yield the most efficient between- 
model exploration would be to sample from the fixed- 
dimension joint PDF directly. Unfortunately, thoroughly 
sampling the PDF for high dimension models rapidly be- 
comes prohibitively costly. Instead, we approximate the 
fuU PDF using the chain from the PTMCMC analysis 
used in the initial detection. 

For noise parameters this is simply a normal distribu- 
tion for each frequency window and channel with mean 

MAP ^^'^ variance l/N. The mean of the noise- 
parameter proposal distribution is the maximum a pos- 
teriori value for each parameter found from the fixed- 
dimension PTMCMC. This approach requires Mi and 
A^o to have slightly different trans-model proposal dis- 



tributions for noise parameters as the analysis done under 
different models will yield different MAP noise parame- 
ters. 

Sufficiently sampling the full signal posterior would re- 
quire 10^" — 10^^ samples, whereas our typical Markov 
chains are only of length 10^ samples. Clearly the full 
PDF will be woefully under-sampled, but thanks to the 
excellent mixing from the fixed dimension exploration 
(courtesy of the parallel tempering) we are confident that 
the high probability density regions of the PDF have 
been adequately sampled and qualitatively reflect the 
true joint distribution. To construct the signal parameter 
proposal distribution q{X^Mi) we first divide the prior 
volume for each parameter A* by the average standard 
deviation tip of that parameter. This average is taken 
over the parameter variances calculated each time the 
Fisher Information Matrix is updated during the PTM- 
CMC analysis. 

The prior volume can now be thought of as (for the 
galactic binary parameterization) an 8-dimensional hy- 
pcrcube with cells of width (jp. The volume of each cell 
is dV = Yii o'r ^^"^ the total number of cells in the hyper- 
cube is A'^ceii = rij '^P{^^)/<^r where Ap(A') is the prior 
range for parameter i. We then sort through the fixed 
dimensional signal-model Markov chain from the PTM- 
CMC to populate the cells in the hypcrcube. Typically 
between 10'^ to 10^ cells are actually occupied for fixed- 
dimension chains of length ./Vmcmc = 10^. If n points 
in the chain land in the cell with reference parameters A 
then the probability density of jumping into that cell is 



qsiX,Mi 



n{X) 



dl/ -/Vmcmc 



(28) 



This is not yet a valid proposal distribution, however, 
because drawing from (?s(A, A^i) will not allow the signal 
model transitions to access the entire prior volume. To 
rectify this we add to the existing distribution a uniform 
distribution over the entire cube, by giving each cell a 
single occupant. The uniform piece of the proposal has 
probability density 



quiMi) 



1 



dVNccn 



(29) 



The full proposal distribution is then the sum of the sam- 
pled and uniform cell populations, with relative weighting 
w 



q{X,Mi) = wqs + (1 - w)qu 



(30) 



To ensure proper normalization of the proposal distribu- 
tion w G [0, 1]. 

To draw from this hybrid distribution we first com- 
pare a number from U[0, 1] to w. If this exceeds w 
we uniformly draw the signal parameters from the prior. 
We then must determine to which cell of the hypercube 
the new set of parameters belongs so we may evaluate 
q{X, Ml). When drawing from the PTMCMC-populated 
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portion of the distribution we randomly select one "link" 
from the fixed dimension chain to select an occupied cell 
of the hypercube. This guarantees that moves to a par- 
ticular occupied cell are proposed at the same frequency 
with which the cell was sampled during the search. We 
then draw the signal parameters uniformly from within 
the selected cell. For transitions into M.^ wc must again 
determine where in the cube the current signal parame- 
ters reside to accurately calculate the trans-dimensional 
Hastings Ratio. 



2. RJ Parallel Tempering 

Between trans-dimensional proposals the chains up- 
date within their current dimension identically to the de- 
scription described in §VA. We only allow the (3 = 1 
chain to undergo trans-dimensional moves. When this 
chain is in a given model it has Nc ~ 20 chains of the 
same model, each of increasing temperature, with which 
to exchange parameters. When trans-dimensional moves 
are accepted by the "cold" chain these parallel tempered 
chains stop updating and store their current locations in 
parameter space, while the analogous chains in the other 
model resume from their most recent positions. This way 
the chain sampling from the target distribution is always 
in contact with a "heat reservoir" of parallel chains, but 
we can avoid wasting computations for chains which have 
no way of communicating with the /3 = 1 parameters. 
The schedule for these different types of updates (inter- 
dimensional MCMC, parallel tempering parameter ex- 
changes, and trans-dimensional proposals) is part of the 
overall proposal distribution and therefore does not af- 
fect the outcome. This schedule docs, however, need to 
be tuned in order to see convergence within a reasonable 
amount of time. 



VIII. RESULTS 

Three test sources, chosen to provide representaitve 
examples over the sky and frequency range of compact 
binaries, were constructed and injected into several dif- 
ferent noise realizations per signal. 



TABLE I: Injected galactic binary parameters. 



signal-to-noise ratios defined by 



i2 

obs 



Source / (mHz) cos 6 (f> cos i ip ipo fT^ 

1 1 0.713 0.452 0.534 2.334 0.624 3.241 

2 3 0.598 2.972 0.827 1.240 5.938 0.139 

3 5 0.326 4.644 0.133 0.314 3.878 0.643 



SNR; 



{h{\)\h{\) 



(31) 



We are then able to learn how the Bayes factor increases 
as a function of SNR to infer when that particular source, 
in that particular noise realization, becomes "detectable" 
(^10 <^ 3). The data was analyzed using several runs of 
the fixed dimension PTMCMC (one for each model) with 
different seeds used to initiate the chains, thus testing the 
stability of the evidence calculation. We used 20 chains 
in the parallel tempering scheme with a maximum tem- 
perature of 100. The temperature spacing of the chains 
was geometric in /3. 



A. Search Phase 

The chains were initialized randomly within the pa- 
rameter space and allowed 5 x 10^ iterations of "burn- 
in" before the samples were considered to be accurately 
representing the target distribution. As can be seen from 
an example chain in Figure 4 the (3=1 chain reached a 
stationary solution in the vicinity of the maximum log- 
likelihood within the burn-in phase (the red [solid] por- 
tion of the plot) thus signaling the arrival of the search to 
the region(s) of maximum posterior weight. This demon- 
strates the success of the search phase in a reasonable 
number of iterations without non-Markovian convergence 
aids. 




iterations x 1 e6 



FIG. 4: \np{s\6) during search (red, solid) and characteri- 
zation (blue, dashed) phase of the analysis for a SNR = 8 
source. 



Characterization 



For each case the analysis was repeated for a range 
of amplitudes, corresponding to signals with a range of 



The 10^ iterations post burn-in were used to charac- 
terize the candidate detection. The parallel tempering 
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allowed the chains to thoroughly explore the entire pos- 
terior distribution function within this alloted time. We 
were surprised to see a great deal of structure in the sig- 
nal posterior even in data where no signals were injected. 
Figure 5 shows the marginalized posterior distributions 
for both the sky location and the fT — In ^ plane when 
the data contained only simulated noise. Notice the con- 
centration of posterior weight at one localized sky loca- 
tion and frequency bin, despite the data being devoid 
of any injected signal. The MAP parameters recovered 
by the analysis were for a (false positive) source with 
SNR ^ 6 while the evidence ratio between this and the 
noise-only model was ~ 0.7. This strengthens our opinion 
that rigorous model selection steps are necessary, as even 
the analysis of noise-only data recovered a "believable" 
signal which was well localized in parameter space but 
easily rejected by our Bayes factor calculations. PDFs 
of higher signal-to-noise sources continue to demonstrate 
complicated secondary features in addition to the regions 
of parameter space near to the injected parameters. Fig- 
ure 2 (bottom panel) in § VA4 is an example of this 
multi- modality. 



C. Evaluation 

The model evidence for each SNR and each hypothesis 
{Aio and A^i) is calculated using the 20 chains from the 
fixed dimension exploration with the integral over /3 be- 
ing evaluated from 10~^ to 1. Figure 6 shows, for a single 
source and noise realization, the Bayes factor as a func- 
tion of increasing SNR. The source becomes "detectable" 
(^10 3) in the vicinity of the fiducial signal-to-noisc ra- 
tio of five threshold. Figure 7 shows the integrand from 
eq. 25 for injected signals of SNR = [0,8] (The signals 
were injected with amplitudes set using the definition 
SNR = ^y {h\h) which differs from Eq. 31. It is expected 
that the SNRs when calculated via Eq. 31 and \/(/i|/i) 
will differ (for data comprised of a finite number of fre- 
quency bins). The results of the Bayes factor vs. SNR 
studies use Eq. 31 to ensure that the interpretation of 
the SNR is the same between noise realizations.) 

From this figure we can learn much about thermody- 
namic integration, including how a maximum tempera- 
ture of 100 is clearly insufficient to capture all of the 
weight in the integrand. Running chains out to a more 
appropriate temperature (T ~ 1 x 10"*) with sufhciently 
dense spacing is inconvenient. This problem can be cir- 
cumvented, however, because at sufficiently high tem- 
perature the expectation value of the likelihood becomes 
independent of the injected signal. This manifests itself 
in the Figure 7 as the different curves become identical at 
lower (3. This is easily understood if one considers that 
the 1/T term in the likelihood decreases the effective SNR 
of the injected source by ~ 1/Vt. Because our analysis 
did not extend beyond sources with SNR = 10, a max- 
imum temperature of 100 (/3 = 0.01) sufficiently hides 
any contribution to the data by the gravitational wave. 



Therefore we can perform the evidence integration up to 
/3 = 0.01 once for each noise realization and only need to 
do the case by case study in the low temperature regime. 
To take advantage of this the analysis was repeated once 
for each noise realization with 20 chains spaced geometri- 
cally with [Tmin, Tmax] = [100, 10000]. The evidence ratio 
from this high temperature analysis is a common multiple 
which needs to be applied to the Bayes factors found for 
each SNR of that noise realization. The inset in Figure 7 
shows that the integrand has very nearly gone to zero 
at the maximum temperature of our analysis, signaling 
convergence of the integral. 



D. Cross Validation 

To verify the Bayes factor calculation from the ther- 
modynamic integration the post burn-in samples from 
the T = 1 chain are sorted into the signal-model hyper- 
cube that will be used for the trans-dimensional proposal 
distribution in the RJMCMC. We experimented with a 
variety of different weightings for the sampled and uni- 
form portions of the proposal, and found little variation 
in the resultant Bayes factors. Because the result is sup- 
posed to be independent of the proposal distribution this 
serves as a cheap test to ensure stability and convergence. 
For production-level runs the relative weighting between 
the two contributions to the proposal distribution was 
w = 0.5 although different values of w were used in test- 
ing to help verify that the proposal distribution was not 
biasing the results. The first 10^ RJ iterations were burn 
in samples, after which the samples were assumed to be 
drawn from the model posterior. Figure 8 shows the 
achieved agreement between the TI and RJMCMC cal- 
culations of the Bayes factor as a function of increasing 
SNR. 



E. Dependence on Priors 

As discussed in § V A 2 the analysis was repeated adopt- 
ing an amplitude threshold such that the minimum al- 
lowed template SNR was 5. Although the SNR at 
which the signal became detectable was unchanged, the 
restrictive prior volume did allow the analysis to defini- 
tively prefer the noise model over the signal model when 
the injected signal was well below this threshold. This is 
in opposition to the examples which allowed an arbitrar- 
ily small amplitude, which caused the two models to be 
indistinguishable at very low SNR despite the additional 
degrees of freedom of the signal model. Figure 9 shows 
the Bayes factor as a function of SNR for the two cases 
with the red (solid) line depicting the results with the 
uninformative prior while the blue (dashed) line resulted 
from the constrained amplitude prior case. What should 
be taken from this demonstration is how the results of 
Bayesian methods depend upon what is being asked of 
the data. The two questions asked here, "Docs the data 
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FIG. 5: Marginalized log posterior densities in the 9 — (j) (left) and the In^ — /Tobs (right) plane for data containing only 
stationary-gaussian noise. Although this data contained no gravitational wave signal the PDFs show organized locations in 
parameter space that look like GW signals. The evaluation step of the analysis easily discarded these potential detections 
returning a Bayes factor of ~ 0.7. The 6 — (j) PDF is shown in a Molweide projection on the sky. 
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FIG. 6: Thermodynamic integration results for Bio on data 
with signals injected at increasingly higher signal-to-noise ra- 
tio. Error bars were established by starting the chains with 
different random number generator seeds. The horizontal line 
is the Bayes factor where one would consider the result a pos- 
itive detection. 



contain any gravitational waves?" and "Does the data 
contain gravitational waves above a certain amplitudeT'' 
have very different interpretations and thus result in very 
different responses from the analysis. This is a feature 
(rather than a flaw) of Bayesian methods that underlines 
the role and importance of ones a priori beliefs about the 
experiment. 



IX. DISCUSSION 

We have implemented and tested an end-to-end 
Bayesian analysis algorithm which proved successful at 
solving the detection problem for a single galactic binary 
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FIG. 7: Integrand for thermodynamic integration on data 
with signals injected at increasingly higher SNR. Of im- 
portance is how the curves become indistinguishable below 
/3 ~ 0.01 [inset]. The horizontal line marks the regimes where 
the integrand supports Aii (above) or A^o (below). The in- 
set shows that the integrand has sufficiently vanished at the 
maximum temperature analyzed. 



signal in simulated LISA data with stationary, gaussian 
noise. We see our approach as a first step towards the ul- 
timate goal of generically solving the detection problem 
in a fully Bayesian manner for realistic data (i.e., non- 
stationary, non-gaussian noise and source confusion). 

Our algorithm single-handedly locates the regions in 
parameter space of high posterior weight (search phase), 
thoroughly samples the distribution about these loca- 
tions (characterization phase), and quantifies the con- 
fidence of the detection (evaluation phase). All three 
phases of the detection problem have been successfully 
completed using a single routine which resolves the PDF 
and calculates the evidence for each of the models un- 
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FIG. 8: Thermodynamic integration and RJMCMC results 
for Bio on data with signals injected at increasingly higher 
signal-to-noise ratio. The horizontal line is the Bayes factor 
where one would consider the result a positive detection. 
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FIG. 9: Thermodynamic integration results for Bio on data 
with signals injected at increasingly higher signal-to-noise ra- 
tio. The horizontal line is the Bayes factor where one would 
consider the result a positive detection. The red (solid) line 
is for the analysis done with uniform priors in the signal pa- 
rameters (apart form /). The blue (dashed) line is the same 
data analyzed with a restrictive prior on amplitude such that 
the minimum Amplitude corresponds to a SNR of 5. See 
§VA2 for more explanation. 



der consideration. The construction of such an approach 
is relatively simple, and this simplicity will serve as the 
foundation for a reliable "one stop" solution to the de- 
tection problem. 

The search and characterization phases were simul- 
taneously accomplished through a parallel tempering 
MCMC analysis which located the regions of interest 
within a few hundred thousand iterations. The efficient 
exploration of the full posterior was clearly demonstrated 
as the results revealed a surprising amount of additional 
structure in the signal PDFs. This could conspire to 



"fool" a less sophisticated analysis by preferentially sam- 
pling from some secondary maxima, thus failing to prop- 
erly characterize the source. This could lead to a poor es- 
timation of astrophysical parameters, or worse, a missed 
or false detection. 

Our algorithm was not fooled thanks to the model se- 
lection analysis provided by the thermodynamic integra- 
tion across the PTMCMC chains. The performance of 
the model selection phase of the problem was verified 
with a RJMCMC analysis. It is our belief that thermo- 
dynamic integration has proven to be the most reliable 
technique, and is exceptionally appealing when we con- 
sider extending this algorithm to a more general GW de- 
tection problem. The implementation of the thermody- 
namic integration calculation is independent of the model 
dimension and should remain as robust for more compli- 
cated scenarios. 

Although we also had success with the RJMCMC al- 
gorithm, we caution that the RJMCMC mixing and con- 
vergence time was very sensitive to the construction of 
the algorithm, and is prone to "fooling" the user by ap- 
pearing to converge to a stationary distribution that is 
not the target distribution. The diffuculty, a well docu- 
mented one in the employment of the RJMCMC, is con- 
structing an efhcient trans-dimensional proposal distribu- 
tion. Although the proposal is not supposed to affect the 
end result of the analysis, we found that sub-optimally 
tuned proposals led to very poor inter-dimensional mix- 
ing which in turn leads to very slow convergence: Well 
beyond the 10^ iterations used to generate the displayed 
results. Once we adopted the PTMCMC-PDF proposal 
it was discovered through repeated testing that the chains 
were stable and rapid in their convergence and in good 
agreement with thermodynamic integration. Figure 8 
demonstrates the RJMCMC's robustness against differ- 
ent starting seeds under this efficient trans-dimensional 
proposal scheme. 

Proving that we are recovering an unbiased set of sam- 
ples from an unknown target distributions is a subtle, if 
not impossible, task. There is a standard suite of conver- 
gence tests to asses the stationarity of the chains, and the 
degree of correlation between chain samples. However, 
chains which are strictly sampling from a single mode 
of an otherwise multi-modal distribution can pass these 
tests making it difficult to quantify the global accuracy 
of the recovered distributions. 

To test for convergence of the PTMCMC and RJM- 
CMC chains we use variants of the Geweke convergence 
test (checking that early and late sub-samples of the 
chains produce the same distributions) and the Gelman 
and Rubin approach (comparing many repeated runs us- 
ing different random seeds - and hence widely dispersed 
starting locations) as shown in Figure 10. The Geweke 
test is a simple approach to seeing if a particular chain 
has reached stationarity, but as stated previously, chains 
that are trapped at local modes can pass this test. To 
asses convergence of the global structure of the poste- 
rior distribution we demand that the recovered PDFs are 
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qualitatively similar between multiple runs with widely 
seperated starting points. 

The comparison of Bayes factor results between TI and 
RJMCMC is not a fully independent consistency check, 
as the RJMCMC empoyed the fixed dimension PDFs 
derived from the PTMCMC runs as part of the trans- 
dimensional proposal distribution. While, in theory, the 
MCMC algrotihm will return the correct posterior distri- 
bution independent of the proposal distribution, in prac- 
tice, for any finite length run one may end up with a 
distribution that has been biased by the choice of pro- 
posal distribution. This bias was apparent when we ex- 
perimented with trans-dimensional proposals based on 
Gassian (Fisher Matrix) approximation to the fixed di- 
mension PDFs. 

To explore the dependence on the choice of trans- 
dimensional proposal distributions we performed multi- 
ple runs using different weightings for the two contribu- 
tions to the trans-dimensional proposal distribution de- 
scribed in §VIIA1. The RJMCMC results were found 
to be consistent for values of w in Eq. 30 between 0.3 
and 0.9. Taking w below 0.3 gave very poor acceptance 
rates and hence long convergence times, so we were not 
able to check for consistency at very low values of w. We 
also did not use w values above 0.9 as some fraction of 
the distribution has to cover the full prior volume. While 
these tests do not prove that our RJMCMC results are 
fully independent of the PTMCMC resuhs, the resuffs 
show no obvious dependence on the choice of proposal 
distribution. Furthermore, it is worth emphasizing that 
the fixed dimension proposal distribution was only used 
for trans-dimensional moves. Between trans-dimensional 
moves the chains were free to explore the parameter- 
space of their respective models using the full cocktail 
of fixed-dimension proposal distributions. Among these 
fixed-dimension proposals are large jumps (such as uni- 
form draws on the full prior range) which are included 
to facilitate the chain's global exploration. The inclu- 
sion of parallel tempering also helps to ensures that the 
chains are not "trapped" in regions favored by the trans- 
dimensional proposal distribution. 

To test further the model selection phase of the prob- 
lem we experimented (unsuccessfully) with constructing 
a Nested Samphng [21, 53] evidence calculator. Nested 
Sampling is receiving attention in the astrophysics and 
cosmology community (e.g. [55, 57]) and more recently in 
gravitational wave detection problems similar to the one 
considered here [8, 9]. Nested Samphng is an alternative 
to the MCMC-driven methods discussed previously. The 
algorithm stochastically explores the posterior by first 
drawing from the prior distribution with some fixed num- 
ber of samples. The "worst-fit" sample is replaced by one 
with higher likelihood (found using a hill climbing routine 
or even a MCMC algorithm) , while also reducing the vol- 
ume covered by the samples. As this procedure is iterated 
the samples become tightly clustered in the regions(s) of 
high posterior weight. The evaluation of the evidence 
reduces eq. 5 in §111 to a one-dimensional integral over 



the fraction of the prior volume contained within the it- 
eratively more restrictive likelihood constraint. A more 
detailed description of the step-by-step workings of the 
algorithm can be found in Ref . [53] . 

A straighforward implementation of the Nested Sam- 
pling algorithm gave results that varied greatly from run 
to run. In many instances one or more of the maxima 
of the posterior would be missed entirely, and different 
methods for estimating the fraction of the prior volume 
covered by the samples gave very different results. In an 
effort to improve the performance we used the PDFs de- 
rived from the PTMCMC algorithm to determine which 
regions of parameter space need to be included in the 
evidence calculation. Regions of parameter space not in- 
cluded in the signal-model hypercube (see §VII A 1) were 
assumed to give a negligible contribution to the evidence 
integral. The Nested Sampling algorithm was applied 
to each occupied cell, and the evidence in each cell was 
added together to give the total evidence for the model. 
While this trick ensured that the regions of high posterior 
weight were included in the integration, the results were 
still highly dependent on how we calculated the fractional 
volume covered by the samples. 

We have recently learned of a new Nested Sampling 
algorithm called MultiNest [56, 58] which provides a so- 
lution to the problem of computing the volume covered 
by the samples and a reliable technique for finding new 
samples with higher likelihoods that can handle multi- 
modal posteriors. The MultiNest algorithm has been 
sucessfully applied to high-dimension parameter estima- 
tion problems in gravitational wave data analysis [59, 60]. 
Even with this improved algorithm, it is necessary to 
first identify the rough location of the posterior modes 
in order to keep the computational cost down. In future 
work we hope to compare the evidence estimates from the 
MultiNest algorithm to those from TI and RJMCMC. 

When comparing the two model selection techniques 
that we have successfully implemented (TI and RJM- 
CMC), we found TI to be the simplest to implement and 
most robust in terms of convergence for a wide range of 
proposal distributions. In our experience the RJMCMC 
technique required very carefully designed proposal dis- 
tributions to achive reasonable acceptance rates for the 
trans-dimensinal moves, and the results can be biased by 
poor choices for the trans-dimensinonal proposals. 

By comparing the SNR at which a signal becomes de- 
tectable between different noise realizations the weakness 
of using fixed SNR detection thresholds becomes appar- 
ent. Figure 11 shows the Bayes factor as a function of in- 
jected SNR for three identical sources injected into three 
different noise realizations. The point at which a signal 
becomes detectable varies appreciably between noise re- 
alizations even for benign, stationary and Gaussian noise. 
Even under ideal circumstances the detectability of a 
source is dependent on the particular noise realization 
with which the source is competing, as opposed to a hy- 
pothetical ensemble of possible noise realizations. Pre- 
dictably, similar variation in source "detection thresh- 
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FIG. 10: Marginalized log posterior densities for sky location (left) and fTobs — In A (right) for two runs using different random 
number seeds for the Markov chains. The consistency between different runs is evidence that the chains have converged to 
target distribution. 
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FIG. 11: Thermodynamic integration results for Bio on data 
with signals injected at increasingly higher signal-to-noise ra- 
tio. The horizontal line is the Bayes factor where one would 
consider the result a positive detection. Different curves rep- 
resent different noise realizations but identical signal param- 
eters. This demonstrates that the detection "threshold" is 
sensitive to the particulars of the noise in the detector, even 
when the noise characteristics are identical (stationary, gaus- 
sian, with known spectral density). 



olds" comes from the details of the waveforms themselves, 
as can be seen in the Bayes factor vs. SNR plots of the 
different test signals (fig. 12). 

In this study we have ignored many real world com- 
plications. Absent a good noise model for LIGO- Virgo 
or LISA the results taken from this algorithm must be 
treated with caution. Like any calculational tool, the 
answers received are only as good as the models under 
comparison. The Bayes factors that we compute do not 
answer the question "is there a GW signal present in the 
data". Both models are imperfect because our descrip- 
tion of the LISA instrument noise is grossly over simpli- 
fied. Addressing the question we really want to answer 
- "is there a binary white dwarf signal in this stretch of 
data" - will require a more realistic noise model. That be- 
ing said, we feel that this work is a useful contribution to- 
wards providing a solution to the detection problem, par- 
ticularly in calculating the evidence for large-dimension 
models. 

In addition to our simplified noise modeling, a 256-bin 
segment of LISA data will likely contain several galac- 
tic binaries with varying degrees of overlap. To properly 
address the LISA-specific detection problem we would 
need to allow for an arbitrary number of sources and our 
algorithm would be responsible for sampling the model 
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FIG. 12: Thermodynamic integration and RJMCMC results 
for Bio on data with injected signals 2, and 3 (red & green and 
blue & magenta, respectively) injected at increasingly higher 
signal-to-noise ratio (see Table I. The horizontal line is the 
Bayes factor where one would consider the result a positive 
detection. This demonstrates that the detection "threshold" 
is sensitive to the details of the waveforms. 

posterior p{Mi,s) to determine the most likely mimber 
of sources, instead of the binary 0- or 1-source case. An 
example of this type of study made with sinusoidal wave- 
forms in simulated LISA data can be found in Rcf. [61]. 
The RJMCMC approach nicely generalizes to the "N or 
A'' -|- 1 sources" detection problem. One problem that has 
been faced while doing global galaxy fitting is the occa- 
sional tendency for the algorithm to fit a single galactic 
binary with two highly-overlapping waveforms. There are 
specific trans-dimensional moves, split/merge proposals 
[61], which are designed to resolve problems with over- 



lapping signals that can be applied the RJMCMC algo- 
rithms. The model evidence, with its built in Occam 
factor, will favor solutions where a single waveform tem- 
plate is used to model each signal. In future work we will 
extend our modeling to include multiple, overlapping sig- 
nals 

Our single-source toy problem is more akin to the 
ground-based application of model selection where GW 
events are expected to be rare. However, our toy problem 
is unrealistic because our treatment of the noise does not 
allow for non-stationarity or departures from Gaussian- 
ity. 

One possibility would be to treat the evidence as a de- 
tection statistic that can be computed from the data and 
calibrated using signal injections and time slides. In- 
stead, we would prefer to "calibrate" the evidence by 
more accurately modeling the noise. There are a variety 
of ways in which this could be done, and we are currently 
pursing several options, such as comparing the evidence 
between different noise models, or using parameterized 
noise models (that allow for tails, etc.) and finding the 
parameters that best fit the noise. While there is no way 
of knowing if the "correct" noise model has been found, 
the analysis in Ref. [16] suggests that a fairly wide class 
of noise models should give adequately promising perfor- 
mance. 
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